Abstract

A major problem of causal inference is the arrangement of dependent nodes in
a directed acyclic graph (DAG) with path coefficients and observed confounders.
Path coefficients do not provide units in which to measure the strength of information
flowing from one node to another. Here we propose a method of causal structure
learning that uses collider v-structures (CVS) with Negative Percentage Mapping
(NPM) to obtain selective thresholds of information strength, which are used to direct
the edges and the associated confounders in a DAG. NPM scales the strength of
information passed through the nodes to percentage units in the interval [0,1]. The
causal structures are constructed bottom-up from the path coefficients, causal
directions and confounders derived from the collider v-structures and NPM. The method
is self-sufficient in observing all the latent confounders present in the causal model
and is capable of detecting every responsible causal direction. The results are tested
on simulated datasets with non-Gaussian distributions and compared with DirectLiNGAM
and ICA-LiNGAM to check the efficiency of the proposed method.


1 Background Study: Markov Model, d-separation, v-structures

In Bayesian learning, the Markov properties play an important role: they set the criteria for
structure analysis. The Markov model is a directed graph that does not contain any bi-directed
edges [5]. This criterion does not guarantee the acyclicity of the model. Using the Markov model,
however, we can find the conditionally dependent and independent nodes, provided we have
information on prior observations. Sometimes it is very difficult to obtain prior observations or
any prior information. Assuming the observed variables are prior, the later ones will be
conditionally dependent on the prior ones. We then need a criterion to observe the posterior
effects. In this scenario, d-separation provides the criterion of sub-structural learning for
conditional dependence and independence of the features. To implement d-separation we need
v-structures.

The primary definition of a v-structure is given for undirected graphical models, where two
nodes are connected only through an undirected edge [3]. The v-structures in a Markov model are
the sub-structures in which three observed nodes are connected only through two directed
edges. For any choice of three nodes, we can arrange them in three possible ways using two
directed edges, irrespective of their positions. The d-separation criterion is used to construct a
set of nodes such that conditioning on it separates the paths/edges between the other two nodes
in the v-structure. Consider the v-structures $a \to b \to c$, $a \leftarrow b \to c$ and
$a \to b \leftarrow c$. In the first and second types, a and c can be d-separated by
observing/conditioning on b, while in the third, conditioning on b makes a and c dependent on
each other, as the ancestor nodes of b remain unobserved. The third type of v-structure is called
a collider.
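
The contrast between the chain and the collider can be checked numerically. The following is a minimal sketch, not part of the proposed method: it assumes linear relations with unit coefficients and Gaussian noise, and it uses the partial correlation as a stand-in for conditioning on b. The helper names `corr` and `partial_corr` are illustrative.

```python
import math
import random


def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def partial_corr(a, c, b):
    """Correlation of a and c after linearly conditioning on b."""
    r_ac, r_ab, r_cb = corr(a, c), corr(a, b), corr(c, b)
    return (r_ac - r_ab * r_cb) / math.sqrt((1 - r_ab**2) * (1 - r_cb**2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Chain a -> b -> c: a and c are dependent, but independent given b.
a1 = [rng.gauss(0, 1) for _ in range(n)]
b1 = [x + rng.gauss(0, 1) for x in a1]
c1 = [x + rng.gauss(0, 1) for x in b1]

# Collider a -> b <- c: a and c are independent, but dependent given b.
a2 = [rng.gauss(0, 1) for _ in range(n)]
c2 = [rng.gauss(0, 1) for _ in range(n)]
b2 = [x + y + rng.gauss(0, 1) for x, y in zip(a2, c2)]

chain_marginal = corr(a1, c1)
chain_given_b = partial_corr(a1, c1, b1)
collider_marginal = corr(a2, c2)
collider_given_b = partial_corr(a2, c2, b2)
```

Under these assumptions the chain shows a clear marginal correlation that vanishes once b is conditioned on, while the collider shows the opposite pattern: no marginal correlation, and a partial correlation near -0.5 given b.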

He and Geng [2] applied an intervention technique to Markov equivalence classes to produce
subgraphs with directed edges. They generated v-structures of subgraphs with Markov properties to
find directions using conditional probabilities. Markov blankets can reveal the directions between
nodes based on the relevance of their connections inside the blanket. The structural learning of a
causal model can be shown through faithfulness and relevance of connections using d-separation
and v-structures [6]. Faithfulness for Markov equivalence classes is beneficial, and weak or strong
faithfulness criteria can be derived for Gaussian and uniform distributions, where path
coefficients and errors are drawn from the intervals [-1,1] and [0,1] respectively [9]. In
Sections 2 and 3 we provide a complete explanation of the use of collider v-structures as a
sub-structural model and of NPM for measuring the strength of path coefficients.


2 Proposed Structural Learning Method 


Every orthodox and modern approach to causal construction lacks a defined measure for the
information passing through directed edges, although the path coefficient/connection strength is
the information passed on. From this information (path coefficient) alone, whether it comes from
nodes or from external confounders, we cannot conclude the directions in a causal model. The
problem becomes more complicated for multivariate structures. We therefore need a measure for
this information through which we can conclude the directions.

Before scaling the flow of information, we need a model to work on for causal construction. The
easiest way to build a larger model is first to build the sub-structures of the bigger model and
then to arrange these small ones to complete the required structure. The directed acyclic causal
graph can be studied as a decomposition of multiple v-structures in the light of the Markov model
[10]. In our case we use a bottom-up approach of arranging multiple v-structures to form a causal
DAG. This type of approach does not need any prior information about the structure or the
contributing variables. After all the directions, path coefficients and confounders have been
derived for the considered v-structures, these can be combined to provide a complete causal
structure. The collider v-structure model provides the necessary sub-structural analysis in a
detailed way.


2.1 Collider v-structure with NPM (CVS with NPM): 

To analyze a larger causal structure at a smaller sub-structural level, consider the v-structure
as defined in the Markov model, that is, a set of three nodes connected through two directed
edges. The collider is a v-structure in which the two extreme nodes have directed edges towards
the middle node. We refer to it as a collider v-structure whenever an external confounder is
present in or added to the collider system, and we designed our algorithm to analyze this type of
DAG. Figure 1 shows the collider v-structures.



(a) (b) (c) 


Figure 1: Collider v-structure

Consider the collider v-structure shown in Figure 1(a), a basic collider system in which Y and Z
collide at X. We refer to the extreme nodes Y and Z as colliders and to X as the collide. The
values $c_y$, $c_z$ are the path coefficients of Y, Z respectively and $e_x$ is the
error/confounder added to node X. Figures 1(b) and 1(c) are permutations of the nodes shown in
Figure 1(a). Figure 1(a) can be represented as $X = c_y Y + c_z Z + e_x$, which is a linear
equation in the multivariate case. The following points explain the choice of the collider
v-structure:

• Collider v-structures are small and easy to infer. The complete inference technique is
described in detail in Section 3.

• In the collider state, the two extreme nodes become conditionally dependent whenever we
observe or condition on the middle node. If we compute the middle node as the
combination of the two extreme nodes and additive noise (confounder), then the collider
v-structure itself becomes a causal model that holds all the properties of a complete causal
structure.

• Inference using successive permutations of the selected nodes in the collider v-structure
provides easier access to the conditional dependence or independence of all three nodes. After
solving for the coefficients of the collider nodes and the confounder, the directions can be
estimated using a threshold criterion.

To direct the edges in a collider v-structure based on their contribution through path
coefficients, we introduce the negative percentage mapping (NPM) technique. NPM provides the
percentage contributions of the path coefficients of the collider nodes and of the confounders in
the collider v-structure towards the middle node (collide). In Figure 1(c), we know the path
coefficients of X and Y towards Z, but we do not know the percentage contributions of $c_x X$,
$c_y Y$ and $e_z$ towards Z, that is, how strongly these values contribute to the formation of Z.
To see what a percentage contribution is, consider a linear model $c = a(x) + b$ given by the
simple numeric representation 10 = 2.37(5.45) - 2.92. If we take 10 as a total of 100%, then the
percentage contributions of (2.37(5.45), -2.92) are (129.2%, -29.2%) respectively. Percentage
contributions above 100% are referred to as over margin percentages and negative contributions
as negative percentages. The problem arises with -29.2%, which means either that -2.92 has no
contribution towards 10 or that it contributes a negative percentage. The next question is how a
contribution can be greater than the total itself, as in the case of 2.37(5.45). In the real world,
negative and over margin contributions do not make sense, as each term must have some
contribution towards the formation of the total. This is where we need the NPM technique, to map
negative and over margin percentages to positive contributions within the margin of 100%. The
following points explain why NPM is required:

• Negative percentage mapping provides a way to map the negative percentages and over
margin percentages into the margin of 100%.

• Using NPM, we can optimize the path coefficients of the collider nodes to find which one
provides the best estimate of direction for the chosen collider v-structure.

• Higher NPM values confirm the strong existence of directions in the v-structure. In the
computation of NPM we maximize the effect of the path coefficients on the collide node, so
the directions produced between the nodes provide the best estimates.

Now the question is why we need to maximize the percentage contributions of the path
coefficients, and whether we are thereby minimizing the contributions of the confounders. In a
collider v-structure our main goal is to find the directions between nodes, irrespective of
whether the contributions of the errors are larger or smaller. We constructed our model in such a
way that maximizing the path coefficients does not necessarily imply minimizing the
errors/confounders.

In Figure 1, (a), (b) and (c) are permutations of the nodes X, Y and Z. Why do we need
permutations of the nodes to find the directions among X, Y and Z? Consider the following points:

1. In Figure 1(c), X and Y have some fixed contribution towards Z irrespective of their
coefficients. Let the contributions be $x_p$ and $y_p$ for X and Y respectively.

2. The coefficients $c_x$ and $c_y$ of X and Y either increase or decrease the contributions
$x_p$ of X and $y_p$ of Y towards Z. This increase or decrease depends on factors such as the
error and other nodes. So the contribution of $c_x X$ depends on the contributions of $c_y Y$
and $e_z$ towards the formation of Z.

3. Using the above criteria, we can observe how the contribution between two nodes can be
maximized under the influence of other nodes. This is done by checking the contributions
of two particular nodes towards each other in the presence of the other nodes. This process is
called the permutation.

In the next section we provide a complete mathematical model of the collider v-structure and NPM
for experimental use.


3 Mathematical model: Collider v-structure, NPM 


To provide a mathematical model, we need to simplify the causal structures shown in Figure 1. We 
also need a generalization for the possible permutations in the collider v-structure. 


3.1 Model for Collider v-structure: 

Consider a set of observations V, an m x n matrix, where m is the number of instances and n is the
number of variables or observations. Draw any three sets of observations (i, j, k) to construct
the collider v-structure with fully observed confounders. The equation below is a generalized form
of the linear model of the collider v-structure.




$$v_i = c_{ji} \sum_{z=1}^{m} v_j + c_{ki} \sum_{z=1}^{m} v_k + e_i \qquad (1)$$


In equation 1, $v_i$ is the middle collide and $v_j$, $v_k$ are the extreme colliders;
$(c_{ji}, c_{ki})$ are the path coefficients of $(v_j, v_k)$ respectively, the subscripts of the
coefficients give the directions $(j \to i, k \to i)$ respectively, and $e_i$ is the confounder
added to $v_i$. The required permutations of the nodes in equation 1 are given in equation 2.

Equation 2 can be solved by multivariate least-squares regression, taking derivatives with respect
to the required parameters and equating them to zero. In equation 2, the parameters are the path
coefficients and errors. Solving the three linear models given in equation 2 yields the matrices
of equation 3.

In equation 3, the matrices shown in (i), (ii) and (iii) provide the parameters of equation 2
when solved for the variables $v_i$, $v_j$ and $v_k$ respectively. The results of equation 3 are
collected in equation 4.

If any of the path coefficients in equation 4 is zero, we can directly conclude that no direction
exists between the corresponding nodes. If all the estimated parameters in the matrix are nonzero,
however, the problem is to select, among the estimated parameters, the path coefficients that
produce the directions between the nodes and the confounders. We therefore need a threshold
criterion for parameter selection, above which we can conclude the directions between nodes. This
threshold is computed from the percentage contributions of the colliders and confounders towards
the collide node. As discussed in Section 2, the usual computation of percentage contribution
leaves no scope for negative and over margin percentages. In the following paragraphs we present
the proposed method of negative percentage mapping (NPM).
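
The least-squares step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the reference implementation: it assumes each permuted model regresses one node on the other two plus a constant term that plays the role of the error/confounder, and the helper names `solve3` and `fit_collider` as well as the synthetic data are hypothetical.

```python
import random


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = (M[r][3] - s) / M[r][r]
    return x


def fit_collider(target, p1, p2):
    """Least-squares fit of target = c1*p1 + c2*p2 + e, with e a constant term."""
    cols = [p1, p2, [1.0] * len(target)]
    # Normal equations (X^T X) beta = X^T y, from setting the derivatives to zero.
    A = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(u * y for u, y in zip(ci, target)) for ci in cols]
    c1, c2, e = solve3(A, b)
    return c1, c2, e


# Synthetic collider Y -> X <- Z with known parameters (illustrative values).
rng = random.Random(1)
Y = [rng.uniform(-1, 1) for _ in range(400)]
Z = [rng.uniform(-1, 1) for _ in range(400)]
X = [2.0 * y - 1.5 * z + 0.7 + rng.uniform(-0.05, 0.05) for y, z in zip(Y, Z)]

# One fit per permutation of the middle node.
fits = {
    "X": fit_collider(X, Y, Z),  # Y -> X <- Z
    "Y": fit_collider(Y, X, Z),  # X -> Y <- Z
    "Z": fit_collider(Z, X, Y),  # X -> Z <- Y
}
```

All three permutations return nonzero coefficients on this data, which is the situation described above in which a threshold criterion is needed to choose between them.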


3.2 Negative Percentage Mapping (NPM): 

To compute the thresholds for selection of best possible directions in the collider v-structure, we 
need to find the percentage contribution for each of the observed path coefficients and confounders 
estimated in equation 3. 

Theorem 1 Consider the linear model for the two-variable set $(x, y) \subset \mathbb{R}$ as
$y = ax + b$, where y is estimated as a function of x and a, b are estimation parameters that fit
the linear model. The percentage contribution ($\%^{c}$) for $(ax, b) \in \mathbb{R}$ is
$(ax, b)\%^{c} = (ax, b)/y$, which always holds for each
$\{(ax)\%^{c}, (b)\%^{c}\} \in [0,1]$ such that $(ax)\%^{c} + (b)\%^{c} = 1$. If
$(ax)\%^{c} + (b)\%^{c} = 1$ for any $\{(ax)\%^{c}, (b)\%^{c}\} \in S$, where the set S contains
the elements from $\{[-\infty,-1], [-1,0], [0,1], [1,\infty]\}$, then for every over margin
percentage there exists a negative percentage for which $(ax)\%^{c} + (b)\%^{c} = 1$ holds. These
percentages can be mapped to the interval [0,1] by the negative percentage mapping defined as

$$(ax, b)\%^{c} = \{|(ax)\%^{c}|, |(b)\%^{c}|\} / \{|(ax)\%^{c}| + |(b)\%^{c}|\}.$$

The above idea can be extended to a multivariate vector space. Why is it called negative
percentage mapping? Consider the three-variable linear model defined in equation 1. The
parameters $(c_{ji}, c_{ki}, e_i)$ can take any values from the set S, where each parameter can
fall in any of the 4 intervals of S. An observable fact is that an over margin percentage implies
the existence of a negative percentage, irrespective of their signs (+,-). So if we observe all the
negative percentage contributions, then through negative percentage mapping we can easily compute
all the parameter contributions in the interval [0,1]. Hence the name negative percentage mapping.
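
As a worked instance of Theorem 1, the sketch below applies the mapping to the numeric example from Section 2, 10 = 2.37(5.45) - 2.92. The function names `percentage_contributions` and `npm` are illustrative; the code divides each term by the sum of the terms, so the raw shares add up to exactly 1.

```python
def percentage_contributions(terms):
    """Raw share of each additive term in their total (shares sum to 1)."""
    total = sum(terms)
    return [t / total for t in terms]


def npm(shares):
    """Negative percentage mapping: |share| divided by the sum of |shares|."""
    denom = sum(abs(s) for s in shares)
    return [abs(s) / denom for s in shares]


# Terms a*x and b of y = a*x + b for the example 10 = 2.37(5.45) - 2.92.
terms = [2.37 * 5.45, -2.92]
raw = percentage_contributions(terms)  # over margin and negative shares
mapped = npm(raw)                      # both shares now lie in [0, 1]
```

Here `raw` is approximately (1.292, -0.292), matching the 129.2% and -29.2% of the example, and `mapped` is approximately (0.816, 0.184). The same two functions accept any number of terms, which corresponds to the multivariate extension mentioned above.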

Using theorem 1 in equation 4, conclusions are drawn on the path coefficients and confounders
responsible for causal directions. The threshold criteria for the selection of directions are as
follows.

1. Find the $\%^{c}$ of the parameters in equation 4. Compare the $\%^{c}$ of
$(c_{ij}, c_{ji})$, $(c_{ik}, c_{ki})$, $(c_{jk}, c_{kj})$ and perform NPM. Select the highest
percentage within each observed coefficient set.

2. In equation 2, it is not possible to have $(c_{ij}\%^{c} = c_{ji}\%^{c})$,
$(c_{ik}\%^{c} = c_{ki}\%^{c})$ or $(c_{jk}\%^{c} = c_{kj}\%^{c})$. If this happens, discard the
values, as they create cycles in the causal structure.
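
The two criteria above amount to a pairwise comparison, sketched below. The function name `select_direction`, the tolerance `tol` used to decide equality of floating-point percentages, and the example values are assumptions made for illustration.

```python
import math


def select_direction(i, j, pct_ij, pct_ji, tol=1e-12):
    """Choose between i -> j and j -> i from their mapped percentages.

    pct_ij is the mapped percentage of the coefficient for i -> j and
    pct_ji that of j -> i. Equal values are discarded (None), since
    keeping both directions would create a cycle.
    """
    if math.isclose(pct_ij, pct_ji, abs_tol=tol):
        return None
    return (i, j) if pct_ij > pct_ji else (j, i)


# Hypothetical mapped percentages for the three coefficient pairs of one triple.
pairs = {
    ("i", "j"): (0.62, 0.21),
    ("i", "k"): (0.10, 0.55),
    ("j", "k"): (0.30, 0.30),
}
directions = {p: select_direction(p[0], p[1], *v) for p, v in pairs.items()}
```

With these values the selected directions are i -> j and k -> i, and the tied pair (j, k) is discarded.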

Using the collider v-structure and NPM, a complete causal DAG is fully traceable with a
completely observed set of confounders. In the next section we discuss the experimental setup for
the randomized experiment and an implementable algorithm.


4 Experimental Setup 

For the machine implementation of our proposed method, an optimized and fully randomized
experimental setup is used. The optimized algorithm for causal inference using the collider
v-structure and NPM takes the matrix on which causal inference is to be performed and returns four
matrices: the strength, direction, error and percentage contribution matrices. The following
assumptions are made in the algorithm design:

1. Input is a matrix of m x n elements, where m is the number of successive observations
and n is the number of variables.

2. Construct a matrix C, where each column of C contains a randomized set of 3 combinations
from the n variables. By constructing the collider v-structure for all combinations of nodes and
confounders, we ensure that no confounder or causal direction between the nodes remains
unobserved.

3. Set the threshold to the highest $\%^{c}$ value observed for a set of nodes in the current
state when compared with previously observed $\%^{c}$ values. The strength, direction, error and
$\%^{c}$ matrices are successively updated based on the threshold condition.

4. The output strength and error matrices contain the path coefficients and confounder values
for which the $\%^{c}$ values are highest. The direction matrix saves a count of the directions
between nodes observed through the threshold condition, and the $\%^{c}$ matrix records the
highest $\%^{c}$ values responsible for directions between any two nodes whenever the threshold
values are updated. The (i, j) entry of the strength matrix gives the strength from node i to
node j, and column j of the error matrix gives the cumulative error sum at node j.
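
The construction of the matrix C in assumption 2 can be sketched as below. The function name `build_triples` and its `seed` parameter are illustrative; the sketch stores the triples as a list in random order rather than as columns of a matrix.

```python
import itertools
import math
import random


def build_triples(n, seed=None):
    """All 3-element combinations of the n variable indices, in random order."""
    triples = list(itertools.combinations(range(n), 3))
    random.Random(seed).shuffle(triples)
    return triples


triples = build_triples(10, seed=0)

# Every pair of variables appears together in at least one triple,
# so every candidate edge is examined by some collider v-structure.
pairs_seen = {frozenset(p) for t in triples for p in itertools.combinations(t, 2)}
```

For n = 10 this gives `math.comb(10, 3)` = 120 triples covering all 45 variable pairs.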

A complete implementable algorithm for causal structure learning using CVS with NPM is given
below.

Input: A matrix V of size m x n
Output: STRN, DRCT, PCNT matrices, each of size n x n, and ERR, a matrix of size 1 x n

Construct matrix C of size 3 x $\binom{n}{3}$ ;
Initialize all output matrices to zero ;
for i ← 1 to combinations in C do
    Construct R by randomly selecting columns from C ;
    Solve for the parameter sets of path coefficients and errors ;
    Store path coefficients in the matrix PARAM and errors in ERS ;
    Compute the $\%^{c}$ of matrix PARAM, ERS ;
    if any of (PARAM or ERS < 0) then
        Compute the $\%^{c}$ matrix by applying NPM on PARAM & ERS ;
    end
    for p, q: rows & columns in R do
        if $\%^{c}$ of PARAM[p,q] > PCNT[p,q] then
            Set a count for directions and store it in DRCT ;
            Update STRN, ERR & PCNT using PARAM, ERS & $\%^{c}$ ;
        end
    end
    for (r, c) in rows & columns of PCNT do
        if PCNT[r,c] > PCNT[c,r] then
            [c, r] of STRN, DRCT, ERR, PCNT = 0 ;
        end
    end
end
return STRN, DRCT, ERR, PCNT ;


Algorithm 1: Causal structure learning using collider v-structure and NPM


The experiment is designed and carried out in the R environment. The computational complexity
of the algorithm is $O(\binom{n}{3} + n^2)$, which makes this method faster than other methods
with a computational complexity of O(n^). Owing to the short computation time, this method can be
used to analyze large datasets.


5 Simulations


For simulation we produced synthetic datasets to check the performance of our proposed model.
The proposed model is tested on Gaussian, non-Gaussian and noisy data. Bach and
Jordan [1] provide the framework for constructing sparse matrices with known probabilities.
Non-Gaussian data is more useful than Gaussian data for the identification of causal DAGs [7]. In
our case the data is synthesized with the following properties:

1. Construct 4 datasets with m = 500, 1500, 2000, 3000 instances and n =
10, 25, 35, 45 variables.

2. For each set of n variables, draw 40% of the i.i.d. samples from Gaussian and non-Gaussian
distributions of the types (1) Uniform, (2) Exponential, (3) Normal, (4) Log-normal,
(5) Laplace, (6) Student-t with 3 degrees of freedom, (7) Student-t with 5 degrees of
freedom, (8)-(28) symmetric mixtures of any two types from (1) to (7), (29)-(49)
non-symmetric mixtures of any two types from (1) to (7).

3. The remaining 60% of the samples for each set of n variables are drawn from a random mixing
of different distributions from (1) to (49) in a linear model as defined in equation 1. This
results in a noisy dataset with correlated errors.

4. For the linear mixture model, the mixing coefficients are randomly drawn from the intervals
[-5, -0.5] ∪ [0.5, 3] and the confounders are drawn from the interval [-2, 3].
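
Step 4 can be illustrated with the sketch below. It is not the generator used for the experiments: the way the two coefficient intervals are weighted (by length), the use of a single constant confounder per mixed column, and the helper names `draw_mixing_coefficient` and `mix_collider` are all assumptions.

```python
import random


def draw_mixing_coefficient(rng):
    """Draw from [-5, -0.5] U [0.5, 3].

    Each interval is chosen with probability proportional to its length,
    which is an assumption; the interval weighting is not specified.
    """
    neg_len, pos_len = 4.5, 2.5
    if rng.random() < neg_len / (neg_len + pos_len):
        return rng.uniform(-5.0, -0.5)
    return rng.uniform(0.5, 3.0)


def mix_collider(parent_a, parent_b, rng):
    """Build a child column as c_a*a + c_b*b + e from two parent columns."""
    c_a = draw_mixing_coefficient(rng)
    c_b = draw_mixing_coefficient(rng)
    e = rng.uniform(-2.0, 3.0)  # confounder drawn from [-2, 3]
    child = [c_a * a + c_b * b + e for a, b in zip(parent_a, parent_b)]
    return child, (c_a, c_b, e)


rng = random.Random(42)
m = 500
# Two non-Gaussian parent columns: Uniform, and Laplace built as a
# randomly signed Exponential draw.
uniform_col = [rng.uniform(0, 1) for _ in range(m)]
laplace_col = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(m)]
child, (c_a, c_b, e) = mix_collider(uniform_col, laplace_col, rng)
```

Because the true coefficients and confounder are returned alongside the mixed column, estimates obtained from the mixed data can be compared against them directly.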

The model is compared on computation time with the ICA-LiNGAM [7] and DirectLiNGAM
[8] methods. The system used for the simulations is a dual-core i3 variant. The environments in
the comparison differ (ICA-LiNGAM and DirectLiNGAM run in MATLAB, CVS with NPM in R), so a true
efficiency comparison is not available. Table 1 gives the time consumed by each method for the
different sample sets.

Table 1: Computational times of CVS with NPM, ICA-LiNGAM and
DirectLiNGAM methods for different i.i.d sample sets

Methods        Variables   Samples
                           500        1500       2000        3000
CVS with NPM   10          0.26 s     0.31 s     0.33 s      0.34 s
               25          4.34 s     4.34 s     4.38 s      4.47 s
               35          12.49 s    12.66 s    13.06 s     13.14 s
               45          26.64 s    27.03 s    26.35 s     26.62 s
ICA-LiNGAM     10          0.62 s     0.64 s     0.66 s      0.69 s
               25          1.98 s     2.27 s     2.68 s      3.94 s
               35          4.18 s     4.52 s     4.65 s      4.499 s
               45          5.27 s     7.55 s     7.23 s      10.35 s
DirectLiNGAM   10          1.03 min   9.49 min   14.27 min   25.40 min

In our simulation test, the ICA-LiNGAM and DirectLiNGAM methods failed on the noisy dataset drawn
from the linear mixture model. The datasets compared in this simulation are i.i.d. samples drawn
from the non-Gaussian distribution types (1) to (49). For smaller datasets CVS with NPM
outperformed ICA-LiNGAM, but ICA-LiNGAM performed better for larger sets. In all test sets CVS
with NPM outperformed DirectLiNGAM from the beginning, so we do not report the remaining
results for DirectLiNGAM.




Figure 2: DAGs for 5 and 8 nodes using CVS with NPM

For the estimation of causal DAGs, we used i.i.d. samples generated from non-Gaussian
distributions. Results for sets of 5 and 8 nodes are used to compare the causal DAGs formed by CVS
with NPM and by DirectLiNGAM. For both methods the causal DAGs are formed using the strength
matrices, and the confounders in the DAG are added from the error matrix. The generated causal
DAGs are shown in Figures 2 and 3. In Figure 3(a), the model constructed by DirectLiNGAM forms a
bi-directed edge between nodes 1 and 3. The strength and error matrices used in this model are
supplied in the supplementary material.




Figure 3: Causal models using DirectLiNGAM for 5 and 8 nodes


The performance of our method on real-world data is compared with DirectLiNGAM. We used the
Boston Housing dataset (regression, multivariate, categorical), available in the UCI Repository
[4]. The dataset contains 14 variables and 506 instances. For causal construction we removed the
third variable from the dataset, as it contains only binary values. The causal DAG for the 13
nodes of the Boston data is shown in Figure 4.



(a) Using CVS with NPM

(b) Using DirectLiNGAM


Figure 4: Causal structures for the Boston Housing dataset using CVS with NPM and DirectLiNGAM


As shown in Figure 4, CVS with NPM always produces a DAG whether the data is noisy or
non-Gaussian, so the acyclicity condition always holds.

6 Conclusion 

Our method provides a new approach to causal inference and structural learning that focuses on
measuring the strength of the information passed between nodes. The proposed model addresses how
much information is passed from a parent node to a child and how much a child node is affected by
external confounders, questions that had remained unsolved to date. The novelty of the method is
that it measures path coefficients in units of percentage contribution in the range [0,1]. The
model provides strong criteria for the formation of directed acyclic causal graphs, as shown in
the figures above. The causal DAG estimation capability is fully verified under synthetic and
real-world conditions in comparison with other methods, with results shown in Section 5. The
method is very useful for producing causal DAGs from noisy datasets. More details about the noisy
data results and the reversible system are provided in the supplementary material. Complete R code
is available for further use and development, and all the datasets used are provided in the
supplementary material. The results of the proposed method can be used to build an image tool for
object-oriented plotting of causal DAGs, which is a future direction of this proposal. The major
problem for this future work is the arrangement of multivariate nodes in bigger models. NPM may
prove useful in broad areas of study, and collider v-structures can be used with other Bayesian
methods for structure estimation in non-confounder and noisy systems.